Note 1: COVID-19 data as of March 30, 2020 [1]; expected cases are derived from GPW2020 population counts [2]. Bayesian spatial models follow [4-6].

Note 2: Because no time-varying covariates are available, the risk models used for disease mapping were fitted to the most recent aggregated data. Map breaks are fixed and show actual case counts as of March 30.

       NAME cases    ses hh.Disab minorities resid.Transport
18 Cuyahoga   449 0.5541   0.5024     0.8354          0.6332
20 Defiance     5 0.2427   0.4432     0.3894          0.1996
22     Erie     5 0.2621   0.5699     0.4333          0.2783
63 Paulding    NA 0.3420   0.6558     0.2951          0.2197
69   Putnam    NA 0.0618   0.1722     0.2677          0.0108
88  Wyandot     1 0.2299   0.6603     0.3967          0.1477

Disease mapping

Cases (30-03-2020)

Exploring the risk of Ohio counties (CDC’s Vulnerability Index)

Relative risk maps generated with INLA.

Vulnerability evaluation

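Bayesian inference does not produce p-values. Instead, the importance of a variable can be judged from its 95% credible interval: if the interval between the 2.5% and 97.5% posterior quantiles does not overlap zero, the variable is considered to have a notable effect.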
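The interval check described above can be sketched in a few lines of base R. This is a minimal illustration, not the code behind this report: the posterior draws are simulated, and the names `draws`, `beta_strong`, `beta_weak` and `summarise_ci` are hypothetical.

```r
# Hypothetical posterior draws for two coefficients (simulated for illustration;
# in practice these would come from the fitted Bayesian model).
set.seed(1)
draws <- list(
  beta_strong = rnorm(4000, mean = 0.8, sd = 0.2),
  beta_weak   = rnorm(4000, mean = 0.1, sd = 0.3)
)

# Equal-tailed 95% credible interval plus a 0/1 flag for "interval excludes zero".
summarise_ci <- function(x, probs = c(0.025, 0.975)) {
  ci <- unname(quantile(x, probs = probs))
  c(lower = ci[1], upper = ci[2],
    excludes_zero = as.numeric(ci[1] > 0 || ci[2] < 0))
}

ci_table <- t(sapply(draws, summarise_ci))
print(round(ci_table, 3))
```

A row with `excludes_zero` equal to 1 corresponds to a variable whose interval lies entirely on one side of zero.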


Evaluating clusters of COVID-19 in Ohio during the first month of the pandemic

Since no other information about the areas is available, we use the inverse of the distance to each of Ohio's three main airports as a covariate. Because the coordinates are expressed in longitude and latitude, great-circle distances are used. Inverse distance to these locations makes it possible to test for increased risk in the areas around the airports. For the methodology, Jung [8] and Zhang and Lin [9] showed a link between GLMs and the spatial scan statistic of Kulldorff [10]. The idea is to use a dummy variable that is 1 for areas inside the cluster and 0 for areas outside it. Jung discussed how to extend model-based approaches for the detection of spatial disease clusters to space and time [8].
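As a rough sketch of how such inverse-distance covariates could be built, the snippet below computes haversine great-circle distances in base R. The county centroids are copied from the cluster table in this report. The airport coordinates are approximate values assumed for illustration, and both the helper `haversine_km` and the choice of kilometres as the unit are assumptions, since the report does not state how the distances were computed.

```r
# Great-circle (haversine) distance in kilometres between points given in
# decimal degrees, using the spherical approximation with radius 6371 km.
haversine_km <- function(lon1, lat1, lon2, lat2, radius_km = 6371) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * radius_km * asin(pmin(1, sqrt(a)))
}

# County centroids taken from the cluster table in this report.
counties <- data.frame(
  name = c("Cuyahoga", "Franklin", "Lucas"),
  lon  = c(-81.65864, -83.00930, -83.65850),
  lat  = c(41.42447, 39.96954, 41.61987)
)

# Approximate airport coordinates (assumed, not taken from the report).
airports <- data.frame(
  code = c("CLE", "CVG", "CMH"),
  lon  = c(-81.85, -84.67, -82.89),
  lat  = c(41.41, 39.05, 40.00)
)

# One inverse-distance covariate per airport, named ID.<code> as in the model.
for (k in seq_len(nrow(airports))) {
  d <- haversine_km(counties$lon, counties$lat, airports$lon[k], airports$lat[k])
  counties[[paste0("ID.", airports$code[k])]] <- 1 / d
}
print(counties)
```

Counties close to an airport receive large values of the matching `ID.` column, so a positive coefficient would indicate higher risk near that airport.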

First, consider a Poisson model for the observed cases \(O_{i,t}\) in county \(i\) at time \(t\), with expected counts \(E_{i,t}\):

\(O_{i,t} \sim \mathrm{Po}(\mu_{i,t}), \qquad \mu_{i,t}=E_{i,t}\theta_{i,t}\)

\(\log(\mu_{i,t})=\log(E_{i,t})+\alpha+\beta x_{i}\)

where \(\mu_{i,t}\) is the Poisson mean for county \(i\) at time \(t\), \(x_{i}\) is a covariate of the outcome of interest, and \(\theta_{i,t}\) is the relative risk, which measures how far the incidence of COVID-19 deviates from the expected number of cases. An estimate of the relative risk that does not require covariates is the standardized incidence ratio (SIR), defined as \(O_{i,t}/E_{i,t}\) [11].
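To make the SIR concrete, here is a small worked example in base R. The areas, counts and populations are entirely hypothetical, and the expected counts are obtained by indirect standardisation with a single overall rate, which is one common choice and not necessarily the procedure used for this report.

```r
# Hypothetical observed cases and populations for three areas.
obs <- c(A = 30, B = 5, C = 15)
pop <- c(A = 100000, B = 50000, C = 50000)

# Indirect standardisation: expected counts under one overall incidence rate.
overall_rate <- sum(obs) / sum(pop)
expected <- pop * overall_rate

# Standardized incidence ratio: observed over expected cases.
sir <- obs / expected
print(round(cbind(obs, expected, sir), 3))
```

An SIR above 1 means an area has more cases than expected under the overall rate, and an SIR below 1 means fewer.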

After fitting this model, Gómez-Rubio et al. [12] propose adding cluster covariates as follows:

\(\log(\mu_{i,t})=\log(E_{i,t})+\alpha+\beta x_{i}+\gamma_{j}C^{(j)}_{i,t}\)

where \(C^{(j)}_{i,t}\) denotes a dummy variable associated with cluster \(j\), \(\gamma_{j}\) is its coefficient, and \(j\) takes values from 1 to the number of clusters. Finally, a zero-inflated Poisson (ZIP) distribution is used to account for the counties that report zero COVID-19 cases. The package DClusterm was used for the cluster detection [13].
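The dummy-variable idea can be illustrated with an ordinary Poisson GLM in base R. The sketch below uses simulated data and tests a single, known candidate cluster with a likelihood-ratio comparison. It does not reproduce the DClusterm search over many candidate clusters or the zero-inflated fit, and all variable names are hypothetical.

```r
set.seed(42)

# Simulated data (hypothetical): 40 areas, of which areas 1-5 form a true
# cluster with relative risk 3; x is a generic area-level covariate.
n <- 40
E <- runif(n, 5, 50)
x <- rnorm(n)
in_cluster <- as.numeric(seq_len(n) <= 5)
O <- rpois(n, lambda = E * exp(0.2 * x + log(3) * in_cluster))

# Baseline model: log(mu) = log(E) + alpha + beta * x
m0 <- glm(O ~ offset(log(E)) + x, family = poisson)

# Cluster model: the same terms plus the dummy for the candidate cluster
m1 <- glm(O ~ offset(log(E)) + x + in_cluster, family = poisson)

# Likelihood-ratio statistic for the cluster term and the estimated cluster risk
lr_stat <- as.numeric(2 * (logLik(m1) - logLik(m0)))
cluster_rr <- exp(coef(m1)[["in_cluster"]])
cat("LR statistic:", round(lr_stat, 2),
    " cluster relative risk:", round(cluster_rr, 2), "\n")
```

In a full scan, this comparison would be repeated for every candidate cluster, which is why the resulting p-values need a correction for multiple testing.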

Cluster detection after adjusting with distances to Ohio’s airports

Model output


Call:
zeroinfl(formula = cases ~ offset(log(E)) + ID.CLE + ID.CVG + ID.CMH | 
    ID.CLE + ID.CVG + ID.CMH, data = mystfdf, dist = "poisson", 
    x = TRUE)

Pearson residuals:
     Min       1Q   Median       3Q      Max 
-3.69706 -0.38293 -0.18822 -0.05094 16.30579 

Count model coefficients (poisson with log link):
            Estimate Std. Error z value Pr(>|z|)    
(Intercept) -0.25136    0.06496  -3.869 0.000109 ***
ID.CLE      20.45994    1.33958  15.273  < 2e-16 ***
ID.CVG      -4.40413    2.60997  -1.687 0.091521 .  
ID.CMH       5.02735    0.95852   5.245 1.56e-07 ***

Zero-inflation model coefficients (binomial with logit link):
            Estimate Std. Error z value Pr(>|z|)    
(Intercept)     2.48       0.58   4.275 1.91e-05 ***
ID.CLE       -277.96      67.16  -4.139 3.49e-05 ***
ID.CVG        -99.66      29.09  -3.426 0.000614 ***
ID.CMH        -35.16      12.67  -2.775 0.005526 ** 
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1 

Number of iterations in BFGS optimization: 392 
Log-likelihood: -1133 on 8 Df

List of non-overlapped clusters:

                    x        y size      minDateCluster
Mahoning41  -80.77631 41.01464    3 2020-03-16 20:00:00
Greene75    -83.88989 39.69147    9 2020-03-20 20:00:00
Lucas109    -83.65850 41.61987   14 2020-03-24 20:00:00
Cuyahoga7   -81.65864 41.42447    6 2020-03-14 20:00:00
Franklin103 -83.00930 39.96954    5 2020-03-23 20:00:00
                 maxDateCluster statistic       pvalue      risk cluster
Mahoning41  2020-03-28 20:00:00 25.509364 9.149348e-13 0.7130359    TRUE
Greene75    2020-03-20 20:00:00 15.389906 2.890292e-08 2.7773964    TRUE
Lucas109    2020-03-28 20:00:00 10.381898 5.195593e-06 0.4913647    TRUE
Cuyahoga7   2020-03-21 20:00:00  3.350700 9.633724e-03 0.2049237    TRUE
Franklin103 2020-03-27 20:00:00  3.249359 1.079523e-02 0.1789636    TRUE
            alpha_bonferroni
Mahoning41      2.083333e-05
Greene75        2.083333e-05
Lucas109        2.083333e-05
Cuyahoga7       2.083333e-05
Franklin103     2.083333e-05
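The table lists a raw p-value for each cluster alongside a Bonferroni-corrected threshold. As a quick check, the sketch below copies those two columns from the output above and compares them. The column names `below_bonferroni` and `below_nominal` are introduced here for illustration only.

```r
# p-values and Bonferroni threshold copied from the cluster table above.
clusters <- data.frame(
  cluster = c("Mahoning41", "Greene75", "Lucas109", "Cuyahoga7", "Franklin103"),
  pvalue  = c(9.149348e-13, 2.890292e-08, 5.195593e-06,
              9.633724e-03, 1.079523e-02),
  alpha_bonferroni = 2.083333e-05
)

# Compare each p-value with the corrected threshold and with the nominal 0.05.
clusters$below_bonferroni <- clusters$pvalue < clusters$alpha_bonferroni
clusters$below_nominal <- clusters$pvalue < 0.05
print(clusters)
```

Under this comparison, the Mahoning, Greene and Lucas clusters fall below the corrected threshold. The Cuyahoga and Franklin clusters fall below 0.05 only, so they may warrant a more cautious reading.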

Most significant spatio-temporal clusters of covid19 detected in Ohio.

FINDINGS

REFERENCES

[1] Data from The New York Times, based on reports from state and local health agencies. https://www.nytimes.com/interactive/2020/us/coronavirus-us-cases.html.

[2] Center for International Earth Science Information Network - CIESIN - Columbia University, United Nations Food and Agriculture Programme - FAO, and Centro Internacional de Agricultura Tropical - CIAT. 2005. Gridded Population of the World, Version 4 (GPWv4.11): Population Count Grid. Palisades, NY: NASA Socioeconomic Data and Applications Center (SEDAC). http://dx.doi.org/10.7927/H4639MPP. Accessed 21 03 2020.

[3] ACS County-to-County Migration Flows 2013-2017. https://www.census.gov/topics/population/migration.html

[4] CDC SVI 2018 Documentation, 1/31/2020. https://svi.cdc.gov/Documents/Data/2018_SVI_Data/SVI2018Documentation.pdf

[5] Moraga P (2019). Geospatial Health Data: Modeling and Visualization with R-INLA and Shiny. Chapman & Hall/CRC Biostatistics Series.

[6] Blangiardo M, Cameletti M, Baio G, Rue H (2013). “Spatial and Spatio-temporal Models with R-INLA.” Spatial and Spatio-temporal Epidemiology, 4, 33–49.

[7] Flanagan, Barry E.; Gregory, Edward W.; Hallisey, Elaine J.; Heitgerd, Janet L.; and Lewis, Brian (2011) “A Social Vulnerability Index for Disaster Management,” Journal of Homeland Security and Emergency Management: Vol. 8: Iss. 1, Article 3. DOI: 10.2202/1547-7355.1792

[8] Jung I (2009). “A Generalized Linear Models Approach to Spatial Scan Statistics for Covariate Adjustment.” Statistics in Medicine, 28(7), 1131–1143. doi:10.1002/sim.3535.

[9] Zhang T, Lin G (2009). “Spatial Scan Statistics in Loglinear Models.” Computational Statistics & Data Analysis, 53(8), 2851–2858. doi:10.1016/j.csda.2008.09.016.

[10] Kulldorff M (1997). “A Spatial Scan Statistic.” Communications in Statistics – Theory and Methods, 26(6), 1481–1496. doi:10.1080/03610929708831995.

[11] Waller LA, Gotway CA (2004). Applied Spatial Statistics for Public Health Data. John Wiley & Sons. doi:10.1002/0471662682.

[12] Gómez-Rubio V, Moraga P, Molitor J (2018). “Fast Bayesian Classification for Disease Mapping and the Detection of Disease Clusters.” In M Cameletti, F Finazzi (eds.), Quantitative Methods in Environmental and Climate Research, pp. 1–27. Springer-Verlag. doi:10.1007/978-3-030-01584-8_1.

[13] Gómez-Rubio V, Moraga P, Molitor J, Rowlingson B (2019). “DClusterm: Model-Based Detection of Disease Clusters.” Journal of Statistical Software, 90(14), 1–26. doi:10.18637/jss.v090.i14.